moore machine
A "good regulator theorem" for embodied agents
Virgo, Nathaniel, Biehl, Martin, Baltieri, Manuel, Capucci, Matteo
In a classic paper, Conant and Ashby claimed that "every good regulator of a system must be a model of that system." Artificial Life has produced many examples of systems that perform tasks with apparently no model in sight; these suggest Conant and Ashby's theorem doesn't easily generalise beyond its restricted setup. Nevertheless, here we show that a similar intuition can be fleshed out in a different way: whenever an agent is able to perform a regulation task, it is possible for an observer to interpret it as having "beliefs" about its environment, which it "updates" in response to sensory input. This notion of belief updating provides a notion of model that is more sophisticated than Conant and Ashby's, as well as a theorem that is more broadly applicable. However, it necessitates a change in perspective, in that the observer plays an essential role in the theory: models are not a mere property of the system but are imposed on it from outside. Our theorem holds regardless of whether the system is regulating its environment in a classic control theory setup, or whether it's regulating its own internal state; the model is of its environment either way. The model might be trivial, however, and this is how the apparent counterexamples are resolved.
Extracting Finite State Machines from Transformers
Fueled by the popularity of the transformer architecture in deep learning, several works have investigated what formal languages a transformer can learn. Nonetheless, existing results remain hard to compare and a fine-grained understanding of the trainability of transformers on regular languages is still lacking. We investigate transformers trained on regular languages from a mechanistic interpretability perspective. Using an extension of the $L^*$ algorithm, we extract Moore machines from transformers. We empirically find tighter lower bounds on the trainability of transformers, when a finite number of symbols determine the state. Additionally, our mechanistic insight allows us to characterise the regular languages a one-layer transformer can learn with good length generalisation. However, we also identify failure cases where the determining symbols get misrecognised due to saturation of the attention mechanism.
Neural Reward Machines
Umili, Elena, Argenziano, Francesco, Capobianco, Roberto
Non-markovian Reinforcement Learning (RL) tasks are very hard to solve, because agents must consider the entire history of state-action pairs to act rationally in the environment. Most works use symbolic formalisms (as Linear Temporal Logic or automata) to specify the temporally-extended task. These approaches only work in finite and discrete state environments or continuous problems for which a mapping between the raw state and a symbolic interpretation is known as a symbol grounding (SG) function. Here, we define Neural Reward Machines (NRM), an automata-based neurosymbolic framework that can be used for both reasoning and learning in non-symbolic non-markovian RL domains, which is based on the probabilistic relaxation of Moore Machines. We combine RL with semisupervised symbol grounding (SSSG) and we show that NRMs can exploit high-level symbolic knowledge in non-symbolic environments without any knowledge of the SG function, outperforming Deep RL methods which cannot incorporate prior knowledge. Moreover, we advance the research in SSSG, proposing an algorithm for analysing the groundability of temporal specifications, which is more efficient than baseline techniques of a factor $10^3$.
Learning Reward Machines through Preference Queries over Sequences
Hsiung, Eric, Biswas, Joydeep, Chaudhuri, Swarat
Reward machines have shown great promise at capturing non-Markovian reward functions for learning tasks that involve complex action sequencing. However, no algorithm currently exists for learning reward machines with realistic weak feedback in the form of preferences. We contribute REMAP, a novel algorithm for learning reward machines from preferences, with correctness and termination guarantees. REMAP introduces preference queries in place of membership queries in the L* algorithm, and leverages a symbolic observation table along with unification and constraint solving to narrow the hypothesis reward machine search space. In addition to the proofs of correctness and termination for REMAP, we present empirical evidence measuring correctness: how frequently the resulting reward machine is isomorphic under a consistent yet inexact teacher, and the regret between the ground truth and learned reward machines.
Towards Partial Monitoring: It is Always too Soon to Give Up
Ferrando, Angelo, Cardoso, Rafael C.
Runtime Verification is a lightweight formal verification technique. It is used to verify at runtime whether the system under analysis behaves as expected. The expected behaviour is usually formally specified by means of properties, which are used to automatically synthesise monitors. A monitor is a device that, given a sequence of events representing a system execution, returns a verdict symbolising the satisfaction or violation of the formal property. Properties that can (resp. cannot) be verified at runtime by a monitor are called monitorable and non-monitorable, respectively. In this paper, we revise the notion of monitorability from a practical perspective, where we show how non-monitorable properties can still be used to generate partial monitors, which can partially check the properties. Finally, we present the implications both from a theoretical and practical perspectives.
Reservoir Memory Machines as Neural Computers
Paaßen, Benjamin, Schulz, Alexander, Stewart, Terrence C., Hammer, Barbara
Differentiable neural computers extend artificial neural networks with an explicit memory without interference, thus enabling the model to perform classic computation tasks such as graph traversal. However, such models are difficult to train, requiring long training times and large datasets. In this work, we achieve some of the computational capabilities of differentiable neural computers with a model that can be trained extremely efficiently, namely an echo state network with an explicit memory without interference. This extension raises the computation power of echo state networks from strictly less than finite state machines to strictly more than finite state machines. Further, we demonstrate experimentally that our model performs comparably to its fully-trained deep version on several typical benchmark tasks for differentiable neural computers.
Learning Finite State Representations of Recurrent Policy Networks
Koul, Anurag, Greydanus, Sam, Fern, Alan
Recurrent neural networks (RNNs) are an effective representation of control policies for a wide range of reinforcement and imitation learning problems. RNN policies, however, are particularly difficult to explain, understand, and analyze due to their use of continuous-valued memory vectors and observation features. In this paper, we introduce a new technique, Quantized Bottleneck Insertion, to learn finite representations of these vectors and features. The result is a quantized representation of the RNN that can be analyzed to improve our understanding of memory use and general behavior. We present results of this approach on synthetic environments and six Atari games. The resulting finite representations are surprisingly small in some cases, using as few as 3 discrete memory states and 10 observations for a perfect Pong policy. We also show that these finite policy representations lead to improved interpretability.
Activity Recognition Through Complex Event Processing: First Findings
Hallé, Sylvain (Université du Québec à Chicoutimi) | Gaboury, Sébastien (Université du Québec à Chicoutimi) | Bouchard, Bruno (Université du Québec à Chicoutimi)
The activities of daily living of a patient in a smart home environment can be detected to a large extent by the real-time analysis of characteristics of the habitat's electrical consumption. However, reasoning over the conduct of these activities occurs at a much higher level of abstraction than what the sensors generally produce. In this paper, we leverage the concept of Complex Event Processing (CEP), in which low-level data streams are progressively transformed into higher-level ones, to the task of activity recognition. We show how the use of an appropriate representation for each level of abstraction can greatly simplify the process. We also report on the use of an existing event stream processor to successfully implement the complete chain, from low-level sensor data up to a sequence of discrete and high-level actions.
Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs
Amato, Christopher (University of Massachusetts, Amherst) | Bonet, Blai (Universidad Simón Bolívar) | Zilberstein, Shlomo (University of Massachusetts, Amherst)
Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones.